Evaluating several unsupervised class-selection methods

Author

  • Erika Johana Salazar

Abstract

In knowledge discovery from collected databases, one of the first questions that arises is "what should be discovered?" Two lines of work can be followed. In the first, unsupervised learning is performed, usually by clustering the data, and is followed by a characterization of the discovered knowledge. In the second, classifiers are constructed for each highly important feature recorded in the database. In this latter approach, the important features are selected using domain background knowledge provided by human experts in the domain from which the data were collected. In real domains, however, the large number of features makes it difficult to select the important features for which classifiers should be constructed. In this study, several measures for ranking class-candidate features are proposed and given a preliminary evaluation on one domain.
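The abstract does not state which ranking measures are proposed. As one illustration of what "ranking class-candidate features" can mean, the sketch below scores each feature by its average mutual information with every other feature, on the assumption that a feature sharing a lot of information with the rest of the database is a more interesting target for a classifier. This measure, the function names `mutual_information` and `rank_class_candidates`, and the requirement that all features be discrete are hypothetical choices made for illustration; they are not the author's method.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def rank_class_candidates(table):
    """Rank the columns of `table` (a dict: feature name -> list of discrete
    values, all of equal length) as class candidates.

    Hypothetical measure (not from the paper): the mean mutual information
    between a feature and every other feature. A higher score means the
    feature shares more information, on average, with the other features.
    """
    names = list(table)
    if len(names) < 2:
        return [(name, 0.0) for name in names]
    totals = {name: 0.0 for name in names}
    for a, b in combinations(names, 2):
        mi = mutual_information(table[a], table[b])  # MI is symmetric
        totals[a] += mi
        totals[b] += mi
    others = len(names) - 1
    scores = {name: total / others for name, total in totals.items()}
    # Highest-scoring (most "class-like") features first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    toy = {
        "smoker": ["y", "y", "n", "n", "y", "n"],
        "cough": ["y", "y", "n", "n", "y", "n"],
        "age_band": ["old", "young", "old", "young", "old", "young"],
    }
    for name, score in rank_class_candidates(toy):
        print(f"{name}: {score:.3f}")
```

On the toy table, `smoker` and `cough` always agree and `age_band` is only weakly related to them, so the two agreeing columns tie for the top score and `age_band` ranks last. Mutual information is used here only because it handles categorical features without fitting any model; any other dependence measure could be swapped in.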

Similar resources

Efficient Partial Order Preserving Unsupervised Feature Selection on Networks

In the past decade, research on network data has attracted much attention and many interesting phenomena have been discovered. Such data are often characterized by high dimensionality, but how to select meaningful and more succinct features for network data has received relatively little attention. In this paper, we investigate the unsupervised feature selection problem on networks. To effectively incorpo...

A Nonlinear Mixture Model based Unsupervised Variable Selection in Genomics and Proteomics

Typical scenarios occurring in genomics and proteomics involve a small number of samples and a large number of variables. Thus, variable selection is necessary for creating disease prediction models that are robust to overfitting. We propose an unsupervised variable selection method based on sparseness-constrained decomposition of a sample. Decomposition is based on nonlinear mixture model comprised of test...

Constraint Score: A new filter method for feature selection with pairwise constraints

Feature selection is an important preprocessing step in mining high-dimensional data. Generally, supervised feature selection methods with supervision information are superior to unsupervised ones without supervision information. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form ...

Discriminative Clustering by Regularized Information Maximization

Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier comple...

Evaluating the Effectiveness of Supervised and Unsupervised Classification Methods in Monitoring Regs (Case Study: Jazmourian Reg)

Because of their mobility and their direct impact on residential areas and various development activities, ergs are of major importance in desert areas, so monitoring them is very important. Since supervised and unsupervised methods are among the most common methods for determining and monitoring land use, in this research, the accuracy...

Publication date: 2001